Implementing a high performance tensor library
Abstract
Template methods have opened up a new way of building C++ libraries. These methods allow the libraries to combine the seemingly contradictory qualities of ease of use and uncompromising efficiency. However, libraries that use these methods are notoriously difficult to develop. This article examines the benefits reaped and the difficulties encountered in using these methods to create a friendly, high performance, tensor library. We find that template methods mostly deliver on this promise, though requiring moderate compromises in either usability or efficiency.
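As a rough illustration of the kind of template method the abstract refers to, the sketch below uses expression templates, one common technique for making `a + b + c` read naturally while still compiling to a single loop with no temporary vectors. It is not taken from the article: the names `Expr`, `Sum`, and `Vec3` are hypothetical, and the fixed size of three elements is an assumption made to keep the example short.

```cpp
#include <array>
#include <cstddef>

// All names here (Expr, Sum, Vec3) are hypothetical, chosen for illustration.

// CRTP base class: tags a type as an expression and recovers its real type.
template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Lazy node representing lhs + rhs. Nothing is computed until an element
// is requested through operator[].
template <typename L, typename R>
struct Sum : Expr<Sum<L, R>> {
    const L& lhs;
    const R& rhs;
    Sum(const L& l, const R& r) : lhs(l), rhs(r) {}
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
};

// Concrete storage type (a rank-1 tensor with three components).
struct Vec3 : Expr<Vec3> {
    std::array<double, 3> data{};

    Vec3() = default;
    Vec3(double x, double y, double z) : data{x, y, z} {}

    // Evaluates any expression tree in one pass over the elements.
    template <typename E>
    Vec3(const Expr<E>& e) {
        for (std::size_t i = 0; i < 3; ++i) data[i] = e.self()[i];
    }

    double operator[](std::size_t i) const { return data[i]; }
};

// Builds an expression node instead of computing a result immediately.
template <typename L, typename R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Sum<L, R>(l.self(), r.self());
}
```

With these definitions, `Vec3 r = a + b + c;` builds a `Sum<Sum<Vec3, Vec3>, Vec3>` at compile time and evaluates it element by element in the `Vec3` constructor. The sketch also hints at the usability cost the abstract mentions: because the nodes hold references, writing `auto r = a + b + c;` would keep a reference to a destroyed temporary, so the result must be assigned to a concrete `Vec3`.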
Similar resources
cuTT: A High-Performance Tensor Transpose Library for CUDA Compatible GPUs
We introduce the CUDA Tensor Transpose (cuTT) library that implements high-performance tensor transposes for NVIDIA GPUs with Kepler and above architectures. cuTT achieves high performance by (a) utilizing two GPU-optimized transpose algorithms that both use a shared memory buffer in order to reduce global memory access scatter, and by (b) computing memory positions of tensor elements using a t...
New implementation of high-level correlated methods using a general block tensor library for high-performance electronic structure calculations
This article presents an open-source object-oriented C++ library of classes and routines to perform tensor algebra. The primary purpose of the library is to enable post-Hartree–Fock electronic structure methods; however, the code is general enough to be applicable in other areas of physical and computational sciences. The library supports tensors of arbitrary order (dimensionality), size, and sy...
Assessment of the Log-Euclidean Metric Performance in Diffusion Tensor Image Segmentation
Introduction: Appropriate definition of the distance measure between diffusion tensors has a deep impact on Diffusion Tensor Image (DTI) segmentation results. The geodesic metric is the best distance measure since it yields high-quality segmentation results. However, the important problem with the geodesic metric is a high computational cost of the algorithms based on it. The main goal of this ...
High Performance Rearrangement and Multiplication Routines for Sparse Tensor Arithmetic
Researchers from diverse disciplines are increasingly incorporating numeric high-order data, i.e., numeric tensors, within their practice. Just like the matrix-vector (MV) paradigm, the development of multi-purpose, but high-performance, sparse data structures and algorithms for arithmetic calculations, e.g., those found in Einstein-like notation, is crucial for the continued adoption of tensors...
Empirical performance model-driven data layout optimization and library call selection for tensor contraction expressions
Empirical optimizers like ATLAS have been very effective in optimizing computational kernels in libraries. The best choice of parameters such as tile size and degree of loop unrolling is determined in ATLAS by executing different versions of the computation. In contrast, optimizing compilers use a model-driven approach to program transformation. While the model-driven approach of optimizing com...
Journal title:
- Scientific Programming
Volume: 11
Publication date: 2003